Pruned Random Subspace Method for One-Class Classifiers
Abstract
The goal of one-class classification is to distinguish the target class from all other classes using only training data from the target class. Because it is difficult for a single one-class classifier to capture all the characteristics of the target class, it may be necessary to combine several one-class classifiers. Previous research has shown that the Random Subspace Method (RSM), in which classifiers are trained on different subsets of the feature space, can be effective for one-class classifiers. In this paper we show that the performance of the RSM can be noisy, and that pruning inaccurate classifiers from the ensemble can be more effective than using all available classifiers. We propose to apply pruning to the RSM of one-class classifiers using either a supervised AUC criterion or an unsupervised consistency criterion. The results suggest that with the AUC criterion, performance can increase dramatically, whereas with the consistency criterion it does not improve but becomes more predictable.
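The abstract describes the pipeline only at a high level: train one-class classifiers on random feature subsets, rank them by a criterion, keep the best, and combine the survivors. The Python sketch below illustrates that idea under stated assumptions. The base learner (a diagonal-Gaussian one-class model, `GaussianOCC`), the mean-rule combiner, the synthetic data, and every function name are hypothetical illustrations, not the authors' setup. AUC is computed with the Mann-Whitney estimate. The consistency measure here is a simplified stand-in (a threshold fixed on training targets, then the drift of the validation rejection rate from it) and is not necessarily the paper's exact definition.

```python
import random
from statistics import mean, pstdev


class GaussianOCC:
    """Hypothetical base learner: a diagonal Gaussian on one feature subset."""

    def __init__(self, features):
        self.features = features  # indices of this member's random subspace

    def fit(self, X):
        cols = [[x[j] for x in X] for j in self.features]
        self.mu = [mean(c) for c in cols]
        # Floor the spread so a constant feature cannot cause division by zero.
        self.sd = [max(pstdev(c), 1e-6) for c in cols]
        return self

    def score(self, x):
        # Higher means more target-like: negative squared standardized distance.
        return -sum(((x[j] - m) / s) ** 2
                    for j, m, s in zip(self.features, self.mu, self.sd))


def train_rsm(X_target, n_classifiers, subspace_size, seed=0):
    """Random Subspace Method: each member is fit on a random feature subset."""
    rng = random.Random(seed)
    d = len(X_target[0])
    return [GaussianOCC(rng.sample(range(d), subspace_size)).fit(X_target)
            for _ in range(n_classifiers)]


def auc(target_scores, outlier_scores):
    """Mann-Whitney AUC: P(target scores above outlier), ties count 1/2."""
    wins = 0.0
    for t in target_scores:
        for o in outlier_scores:
            wins += 1.0 if t > o else (0.5 if t == o else 0.0)
    return wins / (len(target_scores) * len(outlier_scores))


def consistency_error(clf, X_train, X_val, reject_frac=0.1):
    """Target-only criterion (simplified assumption): gap between the rejection
    rate fixed on training targets and the rate observed on validation targets."""
    train_scores = sorted(clf.score(x) for x in X_train)
    threshold = train_scores[int(reject_frac * len(train_scores))]
    rejected = sum(clf.score(x) < threshold for x in X_val) / len(X_val)
    return abs(rejected - reject_frac)  # smaller means more consistent


def prune_by_auc(ensemble, X_val_target, X_val_outlier, keep):
    """Supervised pruning: keep the members with the highest validation AUC."""
    def member_auc(clf):
        return auc([clf.score(x) for x in X_val_target],
                   [clf.score(x) for x in X_val_outlier])
    return sorted(ensemble, key=member_auc, reverse=True)[:keep]


def prune_by_consistency(ensemble, X_train, X_val, keep):
    """Unsupervised pruning: keep the most consistent members (targets only)."""
    return sorted(ensemble,
                  key=lambda c: consistency_error(c, X_train, X_val))[:keep]


def ensemble_score(ensemble, x):
    """Combine the members' scores with the mean rule."""
    return mean(clf.score(x) for clf in ensemble)


def make_data(n, d, shift, informative, rng):
    """Toy data: outliers differ from targets only on the informative features."""
    targets = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    outliers = [[rng.gauss(shift if j in informative else 0, 1) for j in range(d)]
                for _ in range(n)]
    return targets, outliers


# Usage: 10 features, of which only 0, 1 and 2 separate targets from outliers.
rng = random.Random(1)
informative = {0, 1, 2}
train_t, _ = make_data(200, 10, 4.0, informative, rng)
val_t, val_o = make_data(100, 10, 4.0, informative, rng)
test_t, test_o = make_data(100, 10, 4.0, informative, rng)

full = train_rsm(train_t, n_classifiers=30, subspace_size=2)
by_auc = prune_by_auc(full, val_t, val_o, keep=5)
by_consistency = prune_by_consistency(full, train_t, val_t, keep=5)
for name, ens in [("all", full), ("AUC", by_auc), ("consistency", by_consistency)]:
    test_auc = auc([ensemble_score(ens, x) for x in test_t],
                   [ensemble_score(ens, x) for x in test_o])
    print(f"{name:>12}: {len(ens):2d} members, test AUC = {test_auc:.3f}")
```

The two pruning functions differ mainly in the data they need: `prune_by_auc` requires labelled outlier examples for validation, while `prune_by_consistency` uses target examples alone, which is what makes that criterion unsupervised.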